Project - MovieLens Data Analysis

The GroupLens Research Project is a research group in the Department of Computer Science and Engineering at the University of Minnesota. Its MovieLens data is widely used for collaborative filtering and other filtering solutions. Here, however, we use the data to demonstrate our skill in using Python to “play” with data.

Domain

Internet and Entertainment

Note that the project will need you to apply the concepts of groupby and merging extensively.
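
As a rough illustration of the two concepts named above, the sketch below merges a ratings table with a movie table on 'movie id' and then groups by title. The column names follow the datasets used in this project, but the four ratings and two movies are made-up sample values, not taken from the real files.

```python
import pandas as pd

# Hypothetical sample data; only the column names mirror the project datasets.
ratings = pd.DataFrame({
    'user id': [1, 1, 2, 3],
    'movie id': [1, 2, 1, 2],
    'rating': [5, 3, 4, 2],
})
movies = pd.DataFrame({
    'movie id': [1, 2],
    'movie title': ['Toy Story', 'GoldenEye'],
})

# Merging: attach the title to every rating via the shared 'movie id' key.
merged = ratings.merge(movies, on='movie id', how='left')

# Groupby: collapse the merged rows to one average rating per title.
mean_rating = merged.groupby('movie title')['rating'].mean()
print(mean_rating)
```

A left merge keeps every rating even if a movie id has no match in the movie table, which makes missing matches visible as NaN titles instead of silently dropping rows.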


1. Import the necessary packages -

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plot
import seaborn as sns
sns.set(color_codes=True)
%matplotlib inline 

2. Read the 3 datasets into dataframes - 2.5

In [6]:
userrating = pd.read_csv('data.csv')
moviegenre = pd.read_csv('item.csv')
userinfo = pd.read_csv('user.csv')

3. Apply info, shape, describe, and find the number of missing values in the data - 5

In [7]:
###### number of non-null values in each column of the 'userrating' dataframe
userrating.count()
Out[7]:
user id      100000
movie id     100000
rating       100000
timestamp    100000
dtype: int64
In [8]:
###### view the first five rows
userrating.head()
Out[8]:
   user id  movie id  rating  timestamp
0      196       242       3  881250949
1      186       302       3  891717742
2       22       377       1  878887116
3      244        51       2  880606923
4      166       346       1  886397596
In [9]:
###### Let's examine the 'userrating' dataframe
userrating.info()
userrating.describe()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype
---  ------     --------------   -----
 0   user id    100000 non-null  int64
 1   movie id   100000 non-null  int64
 2   rating     100000 non-null  int64
 3   timestamp  100000 non-null  int64
dtypes: int64(4)
memory usage: 3.1 MB
Out[9]:
            user id       movie id         rating     timestamp
count  100000.00000  100000.000000  100000.000000  1.000000e+05
mean      462.48475     425.530130       3.529860  8.835289e+08
std       266.61442     330.798356       1.125674  5.343856e+06
min         1.00000       1.000000       1.000000  8.747247e+08
25%       254.00000     175.000000       3.000000  8.794487e+08
50%       447.00000     322.000000       4.000000  8.828269e+08
75%       682.00000     631.000000       4.000000  8.882600e+08
max       943.00000    1682.000000       5.000000  8.932866e+08

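From the above observations, userrating contains only int64 values. describe() gives a summary (count, mean, standard deviation, minimum, quartiles and maximum) for every column, but only the values for the 'rating' column are meaningful: 'user id' and 'movie id' are identifiers and 'timestamp' is a time value, so their means and quartiles carry no useful information.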
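
One way to act on this observation is to summarise each column according to what it represents. The sketch below uses only the five rows shown by head() above as a stand-in for userrating. Treating 'timestamp' as Unix seconds is an assumption; the notebook does not state the unit.

```python
import pandas as pd

# Stand-in for userrating: the five rows printed by userrating.head() above.
sample = pd.DataFrame({
    'user id': [196, 186, 22, 244, 166],
    'movie id': [242, 302, 377, 51, 346],
    'rating': [3, 3, 1, 2, 1],
    'timestamp': [881250949, 891717742, 878887116, 880606923, 886397596],
})

# Numeric summary only for the column where mean and quartiles mean something.
rating_summary = sample['rating'].describe()

# Identifier columns are better described by their number of distinct values.
id_counts = sample[['user id', 'movie id']].nunique()

# Assumption: timestamps are seconds since the Unix epoch.
rated_at = pd.to_datetime(sample['timestamp'], unit='s')

print(rating_summary)
print(id_counts)
print(rated_at.min(), rated_at.max())
```

On the full dataframe the same three calls would apply unchanged, with userrating in place of sample.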

In [10]:
###### userrating shape
userrating.shape
Out[10]:
(100000, 4)
In [11]:
###### examine if userrating has any null values
userrating.isnull().values.any()
Out[11]:
False
In [12]:
userrating.isnull().sum()
Out[12]:
user id      0
movie id     0
rating       0
timestamp    0
dtype: int64

From the above observations, userrating has 100000 user ratings and none of the rows has null values.

In [13]:
###### Now let's examine the moviegenre dataframe like we did for userrating.
moviegenre.count()
Out[13]:
movie id        1681
movie title     1681
release date    1681
unknown         1681
Action          1681
Adventure       1681
Animation       1681
Childrens       1681
Comedy          1681
Crime           1681
Documentary     1681
Drama           1681
Fantasy         1681
Film-Noir       1681
Horror          1681
Musical         1681
Mystery         1681
Romance         1681
Sci-Fi          1681
Thriller        1681
War             1681
Western         1681
dtype: int64
In [14]:
moviegenre.shape
Out[14]:
(1681, 22)
In [15]:
moviegenre.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1681 entries, 0 to 1680
Data columns (total 22 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   movie id      1681 non-null   int64 
 1   movie title   1681 non-null   object
 2   release date  1681 non-null   object
 3   unknown       1681 non-null   int64 
 4   Action        1681 non-null   int64 
 5   Adventure     1681 non-null   int64 
 6   Animation     1681 non-null   int64 
 7   Childrens     1681 non-null   int64 
 8   Comedy        1681 non-null   int64 
 9   Crime         1681 non-null   int64 
 10  Documentary   1681 non-null   int64 
 11  Drama         1681 non-null   int64 
 12  Fantasy       1681 non-null   int64 
 13  Film-Noir     1681 non-null   int64 
 14  Horror        1681 non-null   int64 
 15  Musical       1681 non-null   int64 
 16  Mystery       1681 non-null   int64 
 17  Romance       1681 non-null   int64 
 18  Sci-Fi        1681 non-null   int64 
 19  Thriller      1681 non-null   int64 
 20  War           1681 non-null   int64 
 21  Western       1681 non-null   int64 
dtypes: int64(20), object(2)
memory usage: 289.0+ KB
In [16]:
moviegenre.head()
Out[16]:
   movie id  movie title  release date  unknown  Action  Adventure  Animation  Childrens  Comedy  Crime  ...  Fantasy  Film-Noir  Horror  Musical  Mystery  Romance  Sci-Fi  Thriller  War  Western
0         1    Toy Story   01-Jan-1995        0       0          0          1          1       1      0  ...        0          0       0        0        0        0       0         0    0        0
1         2    GoldenEye   01-Jan-1995        0       1          1          0          0       0      0  ...        0          0       0        0        0        0       0         1    0        0
2         3   Four Rooms   01-Jan-1995        0       0          0          0          0       0      0  ...        0          0       0        0        0        0       0         1    0        0
3         4   Get Shorty   01-Jan-1995        0       1          0          0          0       1      0  ...        0          0       0        0        0        0       0         0    0        0
4         5      Copycat   01-Jan-1995        0       0          0          0          0       0      1  ...        0          0       0        0        0        0       0         1    0        0

5 rows × 22 columns

In [17]:
moviegenre.describe()
Out[17]:
movie id unknown Action Adventure Animation Childrens Comedy Crime Documentary Drama Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi Thriller War Western
count 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000 1681.000000
mean 841.841761 0.000595 0.149316 0.080309 0.024985 0.072576 0.300416 0.064842 0.029744 0.431291 0.013087 0.014277 0.054729 0.033314 0.036288 0.146936 0.060083 0.149316 0.042237 0.016062
std 485.638077 0.024390 0.356506 0.271852 0.156126 0.259516 0.458576 0.246321 0.169931 0.495404 0.113683 0.118667 0.227519 0.179507 0.187061 0.354148 0.237712 0.356506 0.201189 0.125751
min 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 422.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 842.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 1262.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
max 1682.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
In [18]:
moviegenre.isnull().values.any() 
Out[18]:
False
In [19]:
moviegenre.isnull().sum()
Out[19]:
movie id        0
movie title     0
release date    0
unknown         0
Action          0
Adventure       0
Animation       0
Childrens       0
Comedy          0
Crime           0
Documentary     0
Drama           0
Fantasy         0
Film-Noir       0
Horror          0
Musical         0
Mystery         0
Romance         0
Sci-Fi          0
Thriller        0
War             0
Western         0
dtype: int64
In [20]:
###### Now let's examine the userinfo dataframe
userinfo.count()
Out[20]:
user id       943
age           943
gender        943
occupation    943
zip code      943
dtype: int64
In [21]:
userinfo.shape
Out[21]:
(943, 5)
In [22]:
userinfo.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 943 entries, 0 to 942
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   user id     943 non-null    int64 
 1   age         943 non-null    int64 
 2   gender      943 non-null    object
 3   occupation  943 non-null    object
 4   zip code    943 non-null    object
dtypes: int64(2), object(3)
memory usage: 37.0+ KB
In [23]:
userinfo.head()
Out[23]:
   user id  age gender  occupation zip code
0        1   24      M  technician    85711
1        2   53      F       other    94043
2        3   23      M      writer    32067
3        4   24      M  technician    43537
4        5   33      F       other    15213
In [24]:
userinfo.describe()
Out[24]:
          user id         age
count  943.000000  943.000000
mean   472.000000   34.051962
std    272.364951   12.192740
min      1.000000    7.000000
25%    236.500000   25.000000
50%    472.000000   31.000000
75%    707.500000   43.000000
max    943.000000   73.000000
In [25]:
userinfo.isnull().values.any()
Out[25]:
False
In [26]:
userinfo.isnull().sum()
Out[26]:
user id       0
age           0
gender        0
occupation    0
zip code      0
dtype: int64
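
The shape and missing-value checks above are repeated cell by cell for each dataframe. A small helper could collect them into one table. The sketch below is one possible way to do that; the function name summarize and the two tiny demo frames are hypothetical and stand in for userrating, moviegenre and userinfo.

```python
import pandas as pd

def summarize(frames):
    """Return one row per dataframe: rows, columns and total missing cells."""
    records = []
    for name, df in frames.items():
        records.append({
            'dataframe': name,
            'rows': df.shape[0],
            'columns': df.shape[1],
            'missing values': int(df.isnull().sum().sum()),
        })
    return pd.DataFrame(records).set_index('dataframe')

# Hypothetical demo frames; the second one has a deliberate gap.
demo_frames = {
    'complete': pd.DataFrame({'user id': [1, 2], 'age': [24, 53]}),
    'with_gap': pd.DataFrame({'user id': [1, 2], 'age': [24, None]}),
}

summary = summarize(demo_frames)
print(summary)
```

In the notebook itself the call would presumably look like summarize({'userrating': userrating, 'moviegenre': moviegenre, 'userinfo': userinfo}).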

4. Find the number of movies per genre using the item data - 5

In [27]:
# moviegenre is our dataframe for item data.
genre_list = ['unknown','Action','Adventure','Animation','Childrens','Comedy','Crime','Documentary','Drama','Fantasy',
              'Film-Noir','Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western']
# Each genre column is a 0/1 flag, so the column sum is the number of movies in that genre.
genre_count_df = moviegenre[genre_list].sum().to_frame('count of genres')
print(genre_count_df)
             count of genres
unknown                    1
Action                   251
Adventure                135
Animation                 42
Childrens                122
Comedy                   505
Crime                    109
Documentary               50
Drama                    725
Fantasy                   22
Film-Noir                 24
Horror                    92
Musical                   56
Mystery                   61
Romance                  247
Sci-Fi                   101
Thriller                 251
War                       71
Western                   27
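
The table is easier to read when ranked. The sketch below sorts a subset of the counts printed above and expresses each as a share of the 1681 movies reported by moviegenre.shape. The shares across all genres need not add up to 100%, because one movie can carry several genre flags. The variable names are illustrative only.

```python
import pandas as pd

# Subset of the genre counts printed above, in their original order.
genre_counts = pd.Series(
    {'unknown': 1, 'Action': 251, 'Comedy': 505, 'Drama': 725, 'Fantasy': 22},
    name='count of genres',
)

# Rank genres from most to least common.
ranked = genre_counts.sort_values(ascending=False)

# Share of all movies carrying each genre flag; 1681 rows per moviegenre.shape.
total_movies = 1681
share_percent = (ranked / total_movies * 100).round(1)

print(pd.DataFrame({'count': ranked, 'share (%)': share_percent}))
```

On the full result, genre_count_df.sort_values('count of genres', ascending=False) should give the same ranking for all 19 genres, assuming that column name is the one used when the dataframe was built.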

5. Find the movies that have more than one genre - 2.5

In [28]:
#hint: use sum on the axis = 1
# Row-wise sum over the genre flag columns gives the number of genres per movie.
genre_totals = moviegenre.loc[:, 'unknown':].sum(axis=1)
for title, n_genres in zip(moviegenre['movie title'], genre_totals):
    if n_genres > 1:
        print("\"{0}\" is in {1} genres".format(title, n_genres))
    
"Toy Story " is in 3 genres
"GoldenEye " is in 3 genres
"Get Shorty " is in 3 genres
"Copycat " is in 3 genres
"Twelve Monkeys " is in 2 genres
"Babe " is in 3 genres
"Richard III " is in 2 genres
"Seven (Se7en) " is in 2 genres
"Usual Suspects, The " is in 2 genres
"Postino, Il " is in 2 genres
"French Twist (Gazon maudit) " is in 2 genres
"From Dusk Till Dawn " is in 5 genres
"Angels and Insects " is in 2 genres
"Muppet Treasure Island " is in 5 genres
"Braveheart " is in 3 genres
"Taxi Driver " is in 2 genres
"Rumble in the Bronx " is in 3 genres
"Apollo 13 " is in 3 genres
"Batman Forever " is in 4 genres
"Crimson Tide " is in 3 genres
"Desperado " is in 3 genres
"Doom Generation, The " is in 2 genres
"Free Willy 2: The Adventure Home " is in 3 genres
"Mad Love " is in 2 genres
"Net, The " is in 2 genres
"Strange Days " is in 3 genres
"Disclosure " is in 2 genres
"Dolores Claiborne " is in 2 genres
"Eat Drink Man Woman " is in 2 genres
"Ed Wood " is in 2 genres
"I.Q. " is in 2 genres
"Star Wars " is in 5 genres
"Legends of the Fall " is in 4 genres
"Natural Born Killers " is in 2 genres
"Outbreak " is in 3 genres
"Professional, The " is in 4 genres
"Pulp Fiction " is in 2 genres
"Stargate " is in 3 genres
"Santa Clause, The " is in 2 genres
"What's Eating Gilbert Grape " is in 2 genres
"While You Were Sleeping " is in 2 genres
"Crow, The " is in 3 genres
"Forrest Gump " is in 3 genres
"Four Weddings and a Funeral " is in 2 genres
"Lion King, The " is in 3 genres
"Mask, The " is in 3 genres
"Maverick " is in 3 genres
"Faster Pussycat! Kill! Kill! " is in 3 genres
"Carlito's Way " is in 2 genres
"Firm, The " is in 2 genres
"Free Willy " is in 3 genres
"Fugitive, The " is in 2 genres
"Hot Shots! Part Deux " is in 3 genres
"Hudsucker Proxy, The " is in 2 genres
"Jurassic Park " is in 3 genres
"Much Ado About Nothing " is in 2 genres
"Robert A. Heinlein's The Puppet Masters " is in 2 genres
"Sleepless in Seattle " is in 2 genres
"Blade Runner " is in 2 genres
"So I Married an Axe Murderer " is in 3 genres
"Nightmare Before Christmas, The " is in 3 genres
"True Romance " is in 3 genres
"Welcome to the Dollhouse " is in 2 genres
"Home Alone " is in 2 genres
"Aladdin " is in 4 genres
"Terminator 2: Judgment Day " is in 3 genres
"Dances with Wolves " is in 3 genres
"Silence of the Lambs, The " is in 2 genres
"Snow White and the Seven Dwarfs " is in 3 genres
"Fargo " is in 3 genres
"Heavy Metal " is in 5 genres
"Aristocats, The " is in 2 genres
"All Dogs Go to Heaven 2 " is in 3 genres
"Diabolique " is in 2 genres
"Mystery Science Theater 3000: The Movie " is in 2 genres
"Operation Dumbo Drop " is in 4 genres
"Truth About Cats & Dogs, The " is in 2 genres
"Flipper " is in 2 genres
"Rock, The " is in 3 genres
"Twister " is in 3 genres
"Striptease " is in 2 genres
"Independence Day (ID4) " is in 3 genres
"Frighteners, The " is in 2 genres
"Lone Star " is in 2 genres
"Phenomenon " is in 2 genres
"Godfather, The " is in 3 genres
"Supercop " is in 2 genres
"Bound " is in 4 genres
"Breakfast at Tiffany's " is in 2 genres
"Wizard of Oz, The " is in 4 genres
"Gone with the Wind " is in 3 genres
"2001: A Space Odyssey " is in 4 genres
"D3: The Mighty Ducks " is in 2 genres
"Love Bug, The " is in 2 genres
"Homeward Bound: The Incredible Journey " is in 2 genres
"20,000 Leagues Under the Sea " is in 4 genres
"Bedknobs and Broomsticks " is in 3 genres
"Die Hard " is in 2 genres
"Lawnmower Man, The " is in 3 genres
"Long Kiss Goodnight, The " is in 2 genres
"Ghost and the Darkness, The " is in 2 genres
"Swingers " is in 2 genres
"Willy Wonka and the Chocolate Factory " is in 3 genres
"Sleeper " is in 2 genres
"Dirty Dancing " is in 2 genres
"Reservoir Dogs " is in 2 genres
"Platoon " is in 2 genres
"Basic Instinct " is in 2 genres
"Top Gun " is in 2 genres
"Abyss, The " is in 4 genres
"Wrong Trousers, The " is in 2 genres
"Cinema Paradiso " is in 3 genres
"Delicatessen " is in 2 genres
"Empire Strikes Back, The " is in 6 genres
"Princess Bride, The " is in 4 genres
"Raiders of the Lost Ark " is in 2 genres
"Aliens " is in 4 genres
"Good, The Bad and The Ugly, The " is in 2 genres
"Apocalypse Now " is in 2 genres
"Return of the Jedi " is in 5 genres
"GoodFellas " is in 2 genres
"Alien " is in 4 genres
"Army of Darkness " is in 5 genres
"Psycho " is in 3 genres
"Blues Brothers, The " is in 3 genres
"Godfather: Part II, The " is in 3 genres
"Full Metal Jacket " is in 3 genres
"Grand Day Out, A " is in 2 genres
"Henry V " is in 2 genres
"Amadeus " is in 2 genres
"Sting, The " is in 2 genres
"Terminator, The " is in 3 genres
"Graduate, The " is in 2 genres
"Bridge on the River Kwai, The " is in 2 genres
"Evil Dead II " is in 4 genres
"Groundhog Day " is in 2 genres
"Back to the Future " is in 2 genres
"Patton " is in 2 genres
"Akira " is in 4 genres
"Cyrano de Bergerac " is in 3 genres
"Young Frankenstein " is in 2 genres
"This Is Spinal Tap " is in 3 genres
"Indiana Jones and the Last Crusade " is in 2 genres
"M*A*S*H " is in 2 genres
"Room with a View, A " is in 2 genres
"Pink Floyd - The Wall " is in 3 genres
"When Harry Met Sally... " is in 2 genres
"Bram Stoker's Dracula " is in 2 genres
"Mirror Has Two Faces, The " is in 2 genres
"Star Trek: First Contact " is in 3 genres
"Sling Blade " is in 2 genres
"101 Dalmatians " is in 2 genres
"Die Hard 2 " is in 2 genres
"Star Trek VI: The Undiscovered Country " is in 3 genres
"Star Trek: The Wrath of Khan " is in 3 genres
"Star Trek III: The Search for Spock " is in 3 genres
"Star Trek IV: The Voyage Home " is in 3 genres
"Batman Returns " is in 4 genres
"Young Guns " is in 3 genres
"Jaws " is in 2 genres
"Mars Attacks! " is in 4 genres
"Citizen Ruth " is in 2 genres
"Jerry Maguire " is in 2 genres
"Sneakers " is in 3 genres
"Beavis and Butt-head Do America " is in 2 genres
"Last of the Mohicans, The " is in 3 genres
"Jungle2Jungle " is in 2 genres
"Smilla's Sense of Snow " is in 3 genres
"Devil's Own, The " is in 4 genres
"Chasing Amy " is in 2 genres
"Turbo: A Power Rangers Movie " is in 3 genres
"Grosse Pointe Blank " is in 2 genres
"Fifth Element, The " is in 2 genres
"Lost World: Jurassic Park, The " is in 4 genres
"Pillow Book, The " is in 2 genres
"Batman & Robin " is in 3 genres
"My Best Friend's Wedding " is in 2 genres
"When the Cats Away (Chacun cherche son chat) " is in 2 genres
"Men in Black " is in 4 genres
"Contact " is in 2 genres
"George of the Jungle " is in 2 genres
"Event Horizon " is in 4 genres
"Air Bud " is in 2 genres
"Mimic " is in 2 genres
"Hunt for Red October, The " is in 2 genres
"Kull the Conqueror " is in 2 genres
"Chasing Amy " is in 2 genres
"Gattaca " is in 3 genres
"Starship Troopers " is in 4 genres
"Heat " is in 3 genres
"Sabrina " is in 2 genres
"Sense and Sensibility " is in 2 genres
"Leaving Las Vegas " is in 2 genres
"Bed of Roses " is in 2 genres
"Up Close and Personal " is in 2 genres
"River Wild, The " is in 2 genres
"Emma " is in 2 genres
"Tin Cup " is in 2 genres
"English Patient, The " is in 3 genres
"Scream " is in 2 genres
"Evita " is in 2 genres
"Absolute Power " is in 2 genres
"Donnie Brasco " is in 2 genres
"Breakdown " is in 2 genres
"Face/Off " is in 3 genres
"Hoodlum " is in 3 genres
"Air Force One " is in 2 genres
"L.A. Confidential " is in 4 genres
"Fly Away Home " is in 2 genres
"Mrs. Brown (Her Majesty, Mrs. Brown) " is in 2 genres
"Devil's Advocate, The " is in 4 genres
"FairyTale: A True Story " is in 3 genres
"Wings of the Dove, The " is in 3 genres
"Midnight in the Garden of Good and Evil " is in 4 genres
"Titanic " is in 3 genres
"3 Ninjas: High Noon At Mega Mountain " is in 2 genres
"Apt Pupil " is in 2 genres
"As Good As It Gets " is in 2 genres
"Schindler's List " is in 2 genres
"Everyone Says I Love You " is in 3 genres
"Murder at 1600 " is in 2 genres
"Dante's Peak " is in 2 genres
"Crash " is in 2 genres
"G.I. Jane " is in 3 genres
"Cop Land " is in 3 genres
"Conspiracy Theory " is in 4 genres
"Desperate Measures " is in 3 genres
"Edge, The " is in 2 genres
"Kiss the Girls " is in 3 genres
"Game, The " is in 2 genres
"U Turn " is in 3 genres
"Playing God " is in 2 genres
"House of Yes, The " is in 3 genres
"Mad City " is in 2 genres
"Man Who Knew Too Little, The " is in 2 genres
"Alien: Resurrection " is in 3 genres
"Deconstructing Harry " is in 2 genres
"Jackie Brown " is in 2 genres
"Wag the Dog " is in 2 genres
"Desperate Measures " is in 3 genres
"Hard Rain " is in 2 genres
"Fallen " is in 3 genres
"Spice World " is in 2 genres
"Deep Rising " is in 3 genres
"Wedding Singer, The " is in 2 genres
"Sphere " is in 3 genres
"Client, The " is in 3 genres
"Spawn " is in 4 genres
"Incognito " is in 2 genres
"Blues Brothers 2000 " is in 3 genres
"Mary Reilly " is in 2 genres
"Bridges of Madison County, The " is in 2 genres
"Judge Dredd " is in 3 genres
"Mighty Morphin Power Rangers: The Movie " is in 2 genres
"Heavyweights " is in 2 genres
"Star Trek: Generations " is in 3 genres
"Muriel's Wedding " is in 2 genres
"Adventures of Priscilla, Queen of the Desert, The " is in 2 genres
"Flintstones, The " is in 2 genres
"True Lies " is in 4 genres
"Beverly Hills Cop III " is in 2 genres
"Black Beauty " is in 2 genres
"Last Action Hero " is in 2 genres
"Radioland Murders " is in 3 genres
"Serial Mom " is in 3 genres
"Super Mario Bros. " is in 4 genres
"Three Musketeers, The " is in 3 genres
"Little Rascals, The " is in 2 genres
"Ghost " is in 3 genres
"Batman " is in 4 genres
"Pinocchio " is in 2 genres
"Mission: Impossible " is in 3 genres
"Thinner " is in 2 genres
"Close Shave, A " is in 3 genres
"Jack " is in 2 genres
"Nutty Professor, The " is in 4 genres
"Apple Dumpling Gang, The " is in 3 genres
"Old Yeller " is in 2 genres
"Parent Trap, The " is in 2 genres
"Cinderella " is in 3 genres
"Mary Poppins " is in 3 genres
"Alice in Wonderland " is in 3 genres
"William Shakespeare's Romeo and Juliet " is in 2 genres
"Aladdin and the King of Thieves " is in 3 genres
"E.T. the Extra-Terrestrial " is in 4 genres
"Transformers: The Movie, The " is in 6 genres
"Day the Earth Stood Still, The " is in 2 genres
"Duck Soup " is in 2 genres
"Highlander " is in 2 genres
"Fantasia " is in 3 genres
"Butch Cassidy and the Sundance Kid " is in 3 genres
"Blob, The " is in 2 genres
"Star Trek: The Motion Picture " is in 3 genres
"Star Trek V: The Final Frontier " is in 3 genres
"Grease " is in 3 genres
"Jaws 2 " is in 2 genres
"Jaws 3-D " is in 2 genres
"Beverly Hills Ninja " is in 2 genres
"Free Willy 3: The Rescue " is in 3 genres
"Like Water For Chocolate (Como agua para chocolate) " is in 2 genres
"Jungle Book, The " is in 3 genres
"Courage Under Fire " is in 2 genres
"Dragonheart " is in 3 genres
"James and the Giant Peach " is in 3 genres
"Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb " is in 2 genres
"Matilda " is in 2 genres
"Philadelphia Story, The " is in 2 genres
"Vertigo " is in 2 genres
"North by Northwest " is in 2 genres
"Apartment, The " is in 2 genres
"Some Like It Hot " is in 2 genres
"Casablanca " is in 3 genres
"Maltese Falcon, The " is in 2 genres
"My Fair Lady " is in 2 genres
"Sabrina " is in 2 genres
"Roman Holiday " is in 2 genres
"Notorious " is in 3 genres
"To Catch a Thief " is in 3 genres
"Adventures of Robin Hood, The " is in 2 genres
"Around the World in 80 Days " is in 2 genres
"African Queen, The " is in 4 genres
"Fly Away Home " is in 2 genres
"Dumbo " is in 3 genres
"Bananas " is in 2 genres
"Bonnie and Clyde " is in 2 genres
"Dial M for Murder " is in 2 genres
"Magnificent Seven, The " is in 3 genres
"Lawrence of Arabia " is in 2 genres
"Wings of Desire " is in 3 genres
"Third Man, The " is in 2 genres
"Annie Hall " is in 2 genres
"Boot, Das " is in 3 genres
"Manhattan " is in 3 genres
"Great Escape, The " is in 2 genres
"Deer Hunter, The " is in 2 genres
"Down by Law " is in 2 genres
"Cool Hand Luke " is in 2 genres
"Big Sleep, The " is in 2 genres
"Ben-Hur " is in 3 genres
"Killing Fields, The " is in 2 genres
"Shine " is in 2 genres
"Addicted to Love " is in 2 genres
"Anastasia " is in 3 genres
"Mouse Hunt " is in 2 genres
"Mortal Kombat " is in 2 genres
"Pocahontas " is in 4 genres
"Misérables, Les " is in 2 genres
"Things to Do in Denver when You're Dead " is in 3 genres
"Vampire in Brooklyn " is in 2 genres
"Broken Arrow " is in 2 genres
"NeverEnding Story III, The " is in 2 genres
"Rob Roy " is in 3 genres
"Die Hard: With a Vengeance " is in 2 genres
"Walk in the Clouds, A " is in 2 genres
"Waterworld " is in 2 genres
"Farinelli: il castrato " is in 2 genres
"Heavenly Creatures " is in 3 genres
"Interview with the Vampire " is in 2 genres
"Kid in King Arthur's Court, A " is in 6 genres
"Mary Shelley's Frankenstein " is in 2 genres
"Quick and the Dead, The " is in 3 genres
"Tales from the Hood " is in 2 genres
"Village of the Damned " is in 2 genres
"Clear and Present Danger " is in 3 genres
"Speed " is in 3 genres
"Wolf " is in 2 genres
"Another Stakeout " is in 2 genres
"Blown Away " is in 2 genres
"Body Snatchers " is in 3 genres
"Boxing Helena " is in 3 genres
"City Slickers II: The Legend of Curly's Gold " is in 2 genres
"Cliffhanger " is in 3 genres
"Coneheads " is in 2 genres
"Demolition Man " is in 2 genres
"Englishman Who Went Up a Hill, But Came Down a Mountain, The " is in 2 genres
"Kalifornia " is in 2 genres
"Piano, The " is in 2 genres
"Romeo Is Bleeding " is in 2 genres
"Secret Garden, The " is in 2 genres
"Hour of the Pig, The " is in 2 genres
"Beauty and the Beast " is in 3 genres
"Hellraiser: Bloodline " is in 3 genres
"Primal Fear " is in 2 genres
"True Crime " is in 2 genres
"Heavy " is in 2 genres
"Hunchback of Notre Dame, The " is in 3 genres
"Eraser " is in 2 genres
"Big Squeeze, The " is in 2 genres
"For Whom the Bell Tolls " is in 2 genres
"American in Paris, An " is in 2 genres
"Rear Window " is in 2 genres
"Rebecca " is in 2 genres
"Spellbound " is in 3 genres
"Laura " is in 3 genres
"Night of the Living Dead " is in 2 genres
"Extreme Measures " is in 2 genres
"Swiss Family Robinson " is in 2 genres
"Angels in the Outfield " is in 2 genres
"Three Caballeros, The " is in 3 genres
"Sword in the Stone, The " is in 2 genres
"So Dear to My Heart " is in 2 genres
"Sleepers " is in 2 genres
"Victor/Victoria " is in 2 genres
"Great Race, The " is in 2 genres
"Crying Game, The " is in 4 genres
"Escape from New York " is in 4 genres
"Howling, The " is in 2 genres
"Paths of Glory " is in 2 genres
"Grifters, The " is in 3 genres
"The Innocent " is in 2 genres
"Ran " is in 2 genres
"Quiet Man, The " is in 2 genres
"Once Upon a Time in America " is in 3 genres
"Glory " is in 3 genres
"Rosencrantz and Guildenstern Are Dead " is in 2 genres
"Touch of Evil " is in 3 genres
"Chinatown " is in 3 genres
"Stand by Me " is in 3 genres
"M " is in 3 genres
"Manchurian Candidate, The " is in 2 genres
"Arsenic and Old Lace " is in 3 genres
"Somewhere in Time " is in 2 genres
"Alien 3 " is in 4 genres
"Blood Beach " is in 2 genres
"Body Snatchers " is in 3 genres
"Cape Fear " is in 2 genres
"Volcano " is in 2 genres
"Conan the Barbarian " is in 2 genres
"Kull the Conqueror " is in 2 genres
"I Know What You Did Last Summer " is in 3 genres
"In the Line of Fire " is in 2 genres
"Executive Decision " is in 2 genres
"Perfect World, A " is in 2 genres
"McHale's Navy " is in 2 genres
"Jackal, The " is in 2 genres
"Seven Years in Tibet " is in 2 genres
"Dark City " is in 3 genres
"American President, The " is in 3 genres
"Kicking and Screaming " is in 2 genres
"City Hall " is in 2 genres
"Barcelona " is in 2 genres
"House of the Spirits, The " is in 2 genres
"Singin' in the Rain " is in 2 genres
"Strictly Ballroom " is in 2 genres
"Tin Men " is in 2 genres
"Carrington " is in 2 genres
"To Die For " is in 2 genres
"Home for the Holidays " is in 2 genres
"Juror, The " is in 2 genres
"Canadian Bacon " is in 2 genres
"First Knight " is in 4 genres
"Boys on the Side " is in 2 genres
"Circle of Friends " is in 2 genres
"Fluke " is in 2 genres
"Immortal Beloved " is in 2 genres
"Junior " is in 2 genres
"Queen Margot (Reine Margot, La) " is in 2 genres
"Corrina, Corrina " is in 3 genres
"Dave " is in 2 genres
"Go Fish " is in 2 genres
"Shadowlands " is in 2 genres
"Sirens " is in 2 genres
"Threesome " is in 2 genres
"Pretty Woman " is in 2 genres
"Jane Eyre " is in 2 genres
"Last Supper, The " is in 2 genres
"Ransom " is in 2 genres
"Crow: City of Angels, The " is in 2 genres
"Michael Collins " is in 2 genres
"Benny & Joon " is in 2 genres
"Saint, The " is in 3 genres
"MatchMaker, The " is in 2 genres
"Tomorrow Never Dies " is in 3 genres
"Replacement Killers, The " is in 2 genres
"Red Corner " is in 2 genres
"Jumanji " is in 5 genres
"Lawnmower Man 2: Beyond Cyberspace " is in 2 genres
"Nick of Time " is in 2 genres
"If Lucy Fell " is in 2 genres
"Boomerang " is in 2 genres
"Casper " is in 2 genres
"Congo " is in 4 genres
"Devil in a Blue Dress " is in 4 genres
"Johnny Mnemonic " is in 3 genres
"Something to Talk About " is in 3 genres
"Don Juan DeMarco " is in 3 genres
"French Kiss " is in 2 genres
"Milk Money " is in 2 genres
"Beyond Bedlam " is in 2 genres
"Only You " is in 2 genres
"Perez Family, The " is in 2 genres
"Roommates " is in 2 genres
"Relative Fear " is in 2 genres
"Swimming with Sharks " is in 2 genres
"It Could Happen to You " is in 2 genres
"Richie Rich " is in 2 genres
"Speechless " is in 2 genres
"Timecop " is in 2 genres
"In the Mouth of Madness " is in 2 genres
"Hard Target " is in 4 genres
"Heaven & Earth " is in 3 genres
"Manhattan Murder Mystery " is in 2 genres
"Menace II Society " is in 3 genres
"Program, The " is in 2 genres
"Rising Sun " is in 3 genres
"Andre " is in 2 genres
"One Fine Day " is in 2 genres
"Space Jam " is in 5 genres
"Mrs. Winterbourne " is in 2 genres
"Mulholland Falls " is in 3 genres
"Arrival, The " is in 3 genres
"Daylight " is in 3 genres
"Alaska " is in 2 genres
"Fled " is in 2 genres
"Power 98 " is in 3 genres
"Escape from L.A. " is in 4 genres
"Bogus " is in 3 genres
"Halloween: The Curse of Michael Myers " is in 2 genres
"Gay Divorcee, The " is in 3 genres
"Ninotchka " is in 2 genres
"Loch Ness " is in 2 genres
"Last Man Standing " is in 3 genres
"Glimmer Man, The " is in 2 genres
"Pollyanna " is in 3 genres
"Shaggy Dog, The " is in 2 genres
"To Gillian on Her 37th Birthday " is in 2 genres
"Looking for Richard " is in 2 genres
"Murder, My Sweet " is in 2 genres
"Days of Thunder " is in 2 genres
"Bloody Child, The " is in 2 genres
"Braindead " is in 2 genres
"Bad Taste " is in 2 genres
"Diva " is in 5 genres
"Night on Earth " is in 2 genres
"April Fool's Day " is in 2 genres
"Believers, The " is in 2 genres
"Jingle All the Way " is in 3 genres
"Michael " is in 2 genres
"Fools Rush In " is in 2 genres
"Picture Perfect " is in 2 genres
"She's So Lovely " is in 2 genres
"Money Talks " is in 2 genres
"Excess Baggage " is in 2 genres
"That Darn Cat! " is in 3 genres
"Peacemaker, The " is in 3 genres
"Money Talks " is in 2 genres
"Life Less Ordinary, A " is in 2 genres
"Mortal Kombat: Annihilation " is in 2 genres
"Bent " is in 2 genres
"Flubber " is in 3 genres
"Home Alone 3 " is in 2 genres
"Scream 2 " is in 2 genres
"Time Tracers " is in 3 genres
"Big Lebowski, The " is in 4 genres
"Afterglow " is in 2 genres
"Ma vie en rose (My Life in Pink) " is in 2 genres
"Great Expectations " is in 2 genres
"Oscar & Lucinda " is in 2 genres
"Twilight " is in 2 genres
"U.S. Marshalls " is in 2 genres
"Wild Things " is in 4 genres
"Lost in Space " is in 3 genres
"Mercury Rising " is in 3 genres
"City of Lost Children, The " is in 2 genres
"Farewell My Concubine " is in 2 genres
"White Squall " is in 2 genres
"Unforgettable " is in 2 genres
"Craft, The " is in 2 genres
"Harriet the Spy " is in 2 genres
"Chain Reaction " is in 3 genres
"Island of Dr. Moreau, The " is in 2 genres
"First Kid " is in 2 genres
"Paradise Road " is in 2 genres
"Brassed Off " is in 3 genres
"Smile Like Yours, A " is in 2 genres
"Murder in the First " is in 2 genres
"With Honors " is in 2 genres
"Renaissance Man " is in 3 genres
"Charade " is in 4 genres
"Fox and the Hound, The " is in 2 genres
"Big Blue, The (Grand bleu, Le) " is in 2 genres
"Booty Call " is in 2 genres
"How to Make an American Quilt " is in 2 genres
"Indian in the Cupboard, The " is in 3 genres
"Unstrung Heroes " is in 2 genres
"Before Sunrise " is in 2 genres
"Some Folks Call It a Sling Blade " is in 2 genres
"Month by the Lake, A " is in 2 genres
"Funny Face " is in 2 genres
"Winnie the Pooh and the Blustery Day " is in 2 genres
"Mediterraneo " is in 2 genres
"Eye for an Eye " is in 2 genres
"Solo " is in 3 genres
"Heaven's Prisoners " is in 2 genres
"Trigger Effect, The " is in 2 genres
"Maximum Risk " is in 3 genres
"Beautician and the Beast, The " is in 2 genres
"Cats Don't Dance " is in 3 genres
"Anna Karenina " is in 2 genres
"Head Above Water " is in 2 genres
"Hercules " is in 5 genres
"Big Green, The " is in 2 genres
"Lightning Jack " is in 2 genres
"That Darn Cat! " is in 3 genres
"Geronimo: An American Legend " is in 2 genres
"Until the End of the World (Bis ans Ende der Welt) " is in 2 genres
"Private Parts " is in 2 genres
"Anaconda " is in 3 genres
"Shiloh " is in 2 genres
"Con Air " is in 3 genres
"Die xue shuang xiong (Killer, The) " is in 2 genres
"Gaslight " is in 2 genres
"Fire Down Below " is in 3 genres
"Lay of the Land, The " is in 2 genres
"Grumpier Old Men " is in 2 genres
"Lassie " is in 2 genres
"Little Big League " is in 2 genres
"Homeward Bound II: Lost in San Francisco " is in 2 genres
"Quest, The " is in 2 genres
"Drop Dead Fred " is in 2 genres
"Grease 2 " is in 3 genres
"Two if by Sea " is in 2 genres
"Forget Paris " is in 2 genres
"Just Cause " is in 2 genres
"Paper, The " is in 2 genres
"She's the One " is in 2 genres
"Ghost and Mrs. Muir, The " is in 2 genres
"Dracula: Dead and Loving It " is in 2 genres
"War, The " is in 2 genres
"Adventures of Pinocchio, The " is in 2 genres
"Evening Star, The " is in 2 genres
"Little Princess, A " is in 2 genres
"Crossfire " is in 2 genres
"Koyaanisqatsi " is in 2 genres
"Balto " is in 2 genres
"Amateur " is in 3 genres
"Pyromaniac's Love Story, A " is in 2 genres
"Reality Bites " is in 2 genres
"Pagemaster, The " is in 5 genres
"Oliver & Company " is in 2 genres
"Joe's Apartment " is in 2 genres
"Albino Alligator " is in 2 genres
"Carried Away " is in 2 genres
"Speed 2: Cruise Control " is in 3 genres
"Pete's Dragon " is in 4 genres
"What Happened Was... " is in 3 genres
"Six Degrees of Separation " is in 2 genres
"Two Much " is in 2 genres
"Trust " is in 2 genres
"C'est arrivé près de chez vous " is in 3 genres
"Firestorm " is in 3 genres
"Newton Boys, The " is in 2 genres
"Death and the Maiden " is in 2 genres
"Tank Girl " is in 4 genres
"Twelfth Night " is in 3 genres
"Some Kind of Wonderful " is in 2 genres
"Umbrellas of Cherbourg, The (Parapluies de Cherbourg, Les) " is in 2 genres
"They Made Me a Criminal " is in 2 genres
"Farewell to Arms, A " is in 2 genres
"Old Man and the Sea, The " is in 2 genres
"Chungking Express " is in 3 genres
"Feeling Minnesota " is in 2 genres
"Escape to Witch Mountain " is in 3 genres
"Doors, The " is in 2 genres
"Beautiful Thing " is in 2 genres
"Best Men " is in 4 genres
"Hackers " is in 3 genres
"Hard Eight " is in 2 genres
"In Love and War " is in 2 genres
"Backbeat " is in 2 genres
"Rendezvous in Paris (Rendez-vous de Paris, Les) " is in 2 genres
"Cyclo " is in 2 genres
"Stalker " is in 2 genres
"Love! Valour! Compassion! " is in 2 genres
"Palookaville " is in 2 genres
"Big Bully " is in 2 genres
"Spanking the Monkey " is in 2 genres
"Bliss " is in 2 genres
"Caught " is in 2 genres
"Welcome To Sarajevo " is in 2 genres
"I Love Trouble " is in 2 genres
"Low Down Dirty Shame, A " is in 2 genres
"Cowboy Way, The " is in 2 genres
"In the Army Now " is in 2 genres
"Inkwell, The " is in 2 genres
"Young Guns II " is in 3 genres
"That Old Feeling " is in 2 genres
"Letter From Death Row, A " is in 2 genres
"Once Were Warriors " is in 2 genres
"Family Thing, A " is in 2 genres
"Purple Noon " is in 2 genres
"Cemetery Man (Dellamorte Dellamore) " is in 2 genres
"Kim " is in 2 genres
"Top Hat " is in 3 genres
"To Be or Not to Be " is in 3 genres
"Kiss of Death " is in 3 genres
"Virtuosity " is in 2 genres
"Blue Sky " is in 2 genres
"Flesh and Bone " is in 3 genres
"Guilty as Sin " is in 3 genres
"Barb Wire " is in 2 genres
"Goofy Movie, A " is in 4 genres
"Night Falls on Manhattan " is in 2 genres
"Poison Ivy II " is in 2 genres
"Marked for Death " is in 2 genres
"Twisted " is in 2 genres
"Cutthroat Island " is in 3 genres
"Ghost in the Shell (Kokaku kidotai) " is in 2 genres
"Van, The " is in 2 genres
"Trial and Error " is in 2 genres
"Pie in the Sky " is in 2 genres
"Love in the Afternoon " is in 2 genres
"Talking About Sex " is in 2 genres
"Color of Night " is in 2 genres
"Robocop 3 " is in 2 genres
"Set It Off " is in 2 genres
"Selena " is in 2 genres
"Wild America " is in 2 genres
"Before and After " is in 2 genres
"Shall We Dance? " is in 3 genres
"Country Life " is in 2 genres
"Simple Wish, A " is in 2 genres
"Star Kid " is in 4 genres
"Kicked in the Head " is in 2 genres
"Indian Summer " is in 2 genres
"Love Affair " is in 2 genres
"Band Wagon, The " is in 2 genres
"Penny Serenade " is in 2 genres
"'Til There Was You " is in 2 genres
"New York Cop " is in 2 genres
"Babyfever " is in 2 genres
"Waiting to Exhale " is in 2 genres
"Pompatus of Love, The " is in 2 genres
"Palmetto " is in 3 genres
"Surviving the Game " is in 3 genres
"Inventing the Abbotts " is in 2 genres
"Loaded " is in 2 genres
"Midnight Dancers (Sibak) " is in 2 genres
"Kazaam " is in 3 genres
"Stefano Quantestorie " is in 2 genres
"For the Moment " is in 2 genres
"Johnny 100 Pesos " is in 2 genres
"JLG/JLG - autoportrait de décembre " is in 2 genres
"I Can't Sleep (J'ai pas sommeil) " is in 2 genres
"Machine, The " is in 2 genres
"Hotel de Love " is in 2 genres
"Second Jungle Book: Mowgli & Baloo, The " is in 2 genres
"Roseanna's Grave (For Roseanna) " is in 2 genres
"Stag " is in 2 genres
"Picture Bride " is in 2 genres
"Caro Diario (Dear Diary) " is in 2 genres
"When Night Is Falling " is in 2 genres
"Swan Princess, The " is in 2 genres
"Barbarella " is in 2 genres
"Land Before Time III: The Time of the Great Giving (199" is in 2 genres
"Next Karate Kid, The " is in 2 genres
"No Escape " is in 2 genres
"Highlander III: The Sorcerer " is in 2 genres
"Suture " is in 2 genres
"Walking Dead, The " is in 2 genres
"I Like It Like That " is in 3 genres
"I'll Do Anything " is in 2 genres
"Grace of My Heart " is in 2 genres
"Sliding Doors " is in 2 genres
"Men of Means " is in 2 genres
"Mr. Jones " is in 2 genres
"Jason's Lyric " is in 2 genres
"Moonlight and Valentino " is in 2 genres
"That Darn Cat! " is in 3 genres
"Golden Earrings " is in 2 genres
"Lady of Burlesque " is in 2 genres
"Angel on My Shoulder " is in 2 genres
"Beat the Devil " is in 2 genres
"Love Is All There Is " is in 2 genres
"Damsel in Distress, A " is in 3 genres
"Sleepover " is in 2 genres
"Thieves (Voleurs, Les) " is in 3 genres
"Last Summer in the Hamptons " is in 2 genres
"Tom and Huck " is in 2 genres
"Gumby: The Movie " is in 2 genres
"Visitors, The (Visiteurs, Les) " is in 2 genres
"Little Princess, The " is in 2 genres
"Nina Takes a Lover " is in 2 genres
"Bhaji on the Beach " is in 2 genres
"Nightwatch " is in 2 genres
"Dead Presidents " is in 3 genres
"Herbie Rides Again " is in 3 genres
"Man in the Iron Mask, The " is in 3 genres
"Jerky Boys, The " is in 2 genres
"Colonel Chabert, Le " is in 3 genres
"Even Cowgirls Get the Blues " is in 2 genres
"Tough and Deadly " is in 3 genres
"Carpool " is in 2 genres
"Naked in New York " is in 2 genres
"Gold Diggers: The Secret of Bear Mountain " is in 2 genres
"Killer: A Journal of Murder " is in 2 genres
"Babysitter, The " is in 2 genres
"Wings of Courage " is in 2 genres
"New Jersey Drive " is in 2 genres
"Mr. Wonderful " is in 2 genres
"Good Man in Africa, A " is in 2 genres
"Object of My Affection, The " is in 2 genres
"Witness " is in 3 genres
"Far From Home: The Adventures of Yellow Dog " is in 2 genres
"Twin Town " is in 2 genres
"Amazing Panda Adventure, The " is in 2 genres
"Frankie Starlight " is in 2 genres
"The Courtyard " is in 2 genres
"Underneath, The " is in 2 genres
"Secret Adventures of Tom Thumb, The " is in 2 genres
"Condition Red " is in 3 genres
"Yankee Zulu " is in 2 genres
"Hostile Intentions " is in 3 genres
"Tigrero: A Film That Was Never Made " is in 2 genres
"Vermont Is For Lovers " is in 2 genres
"Vie est belle, La (Life is Rosey) " is in 2 genres
"Lashou shentan " is in 3 genres
"Salut cousin! " is in 2 genres
"Shopping " is in 2 genres
"Nemesis 2: Nebula " is in 3 genres
"Romper Stomper " is in 2 genres
"City of Industry " is in 2 genres
"He Walked by Night " is in 3 genres
"Buddy " is in 3 genres
"Truth or Consequences, N.M. " is in 3 genres
"Tokyo Fist " is in 2 genres
"Reluctant Debutante, The " is in 2 genres
"Warriors of Virtue " is in 4 genres
"King of New York " is in 2 genres
"Nightwatch " is in 2 genres
"Nobody Loves Me (Keiner liebt mich) " is in 2 genres
"Wife, The " is in 2 genres
"Slingshot, The " is in 2 genres
"Á köldum klaka (Cold Fever) " is in 2 genres
"Normal Life " is in 2 genres
"Men With Guns " is in 2 genres
"Hana-bi " is in 3 genres
"Big One, The " is in 2 genres
"Spanish Prisoner, The " is in 2 genres
"Favor, The " is in 2 genres
"Little City " is in 2 genres
"Target " is in 2 genres
"Rough Magic " is in 2 genres
"Nothing Personal " is in 2 genres
"MURDER and murder " is in 3 genres
"Tainted " is in 2 genres
"Mirage " is in 2 genres
"B. Monkey " is in 2 genres
"Sliding Doors " is in 2 genres

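A listing like the one above can also be produced without an explicit loop over rows. The sketch below is only an illustration on made-up data: the frame `toy_movies`, its titles, and the four genre columns are assumptions, not the notebook's actual `moviegenre` table.

```python
import pandas as pd

# Toy stand-in for the movie table; titles and flags are invented for illustration.
toy_movies = pd.DataFrame({
    "movie title": ["Movie A ", "Movie B ", "Movie C "],
    "Action":  [1, 0, 1],
    "Comedy":  [1, 1, 0],
    "Drama":   [0, 0, 1],
    "Romance": [0, 0, 1],
})

genre_cols = ["Action", "Comedy", "Drama", "Romance"]

# Row-wise sum of the 0/1 genre flags gives the number of genres per movie.
toy_movies["genre count"] = toy_movies[genre_cols].sum(axis=1)

# Keep only the movies that belong to more than one genre.
multi_genre = toy_movies[toy_movies["genre count"] > 1]

for title, n in zip(multi_genre["movie title"], multi_genre["genre count"]):
    print(f'"{title}" is in {n} genres')
```

With the real data, `genre_cols` would be the list of genre columns in `moviegenre`.
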
6. Drop the movie where the genre is unknown - 2.5

In [29]:
moviegenre_dropUnknown = moviegenre[moviegenre.unknown!=1]
moviegenre_dropUnknown
Out[29]:
movie id movie title release date unknown Action Adventure Animation Childrens Comedy Crime ... Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi Thriller War Western
0 1 Toy Story 01-Jan-1995 0 0 0 1 1 1 0 ... 0 0 0 0 0 0 0 0 0 0
1 2 GoldenEye 01-Jan-1995 0 1 1 0 0 0 0 ... 0 0 0 0 0 0 0 1 0 0
2 3 Four Rooms 01-Jan-1995 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 1 0 0
3 4 Get Shorty 01-Jan-1995 0 1 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 0
4 5 Copycat 01-Jan-1995 0 0 0 0 0 0 1 ... 0 0 0 0 0 0 0 1 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1676 1678 Mat' i syn 06-Feb-1998 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1677 1679 B. Monkey 06-Feb-1998 0 0 0 0 0 0 0 ... 0 0 0 0 0 1 0 1 0 0
1678 1680 Sliding Doors 01-Jan-1998 0 0 0 0 0 0 0 ... 0 0 0 0 0 1 0 0 0 0
1679 1681 You So Crazy 01-Jan-1994 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 0
1680 1682 Scream of Stone (Schrei aus Stein) 08-Mar-1996 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

1680 rows × 22 columns
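A quick way to confirm that a filter like the one above removed exactly the flagged rows is to compare row counts before and after. The following is a minimal sketch on a toy frame; `toy_movies` and its contents are hypothetical.

```python
import pandas as pd

# Toy stand-in for the movie table (hypothetical data).
toy_movies = pd.DataFrame({
    "movie id": [1, 2, 3],
    "movie title": ["Movie A", "Movie B", "Movie C"],
    "unknown": [0, 1, 0],
})

# Boolean mask: keep rows whose 'unknown' flag is not set.
kept = toy_movies[toy_movies["unknown"] != 1]

# Sanity checks: exactly the flagged rows were removed, and none remain.
n_dropped = len(toy_movies) - len(kept)
assert n_dropped == (toy_movies["unknown"] == 1).sum()
assert (kept["unknown"] == 0).all()

# Optional: renumber the index so it has no gaps after filtering.
kept = kept.reset_index(drop=True)
```

Resetting the index is a matter of taste; the notebook keeps the original index, which is why the last row above is still labelled 1680.
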

7. Univariate plots of columns: 'rating', 'Age', 'release year', 'Gender' and 'Occupation' - 10

In [30]:
# HINT: use distplot for age and countplot for gender,ratings,occupation.
# HINT: Please refer to the below snippet to understand how to get to release year from release date. You can use str.split()
# as depicted below
# Hint : Use displot without kde for release year or line plot showing year wise count.
In [31]:
#distplot for age. kde is also shown
sns.distplot(userinfo['age'])
Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c3e00b3d48>
In [32]:
#count plot for gender (passing the column as x= also works on newer seaborn releases)
sns.countplot(x=userinfo['gender'])
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c3e0ba1a48>
In [46]:
#countplot for occupation (large figure so the occupation labels stay readable)
fig = plot.gcf()
fig.set_size_inches( 25, 50)
sns.countplot(x=userinfo['occupation'])
Out[46]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c3e20a3f48>
In [34]:
#countplot for ratings
sns.countplot(x=userrating['rating'])
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c3e0cf4b08>
In [35]:
a = 'My*cat*is*brown'
print(a.split('*')[3])

#similarly, the release year needs to be taken out from release date

#also you can simply slice existing string to get the desired data, if we want to take out the colour of the cat

print(a[10:])
print(a[-5:])
brown
brown
brown
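Applied to a date column, the same idea looks roughly like the sketch below. It assumes dates formatted like `01-Jan-1995`, as in the `release date` column shown earlier; the series `toy_dates` is made up. Two options are shown: splitting the string, and parsing it as a date.

```python
import pandas as pd

# Hypothetical sample in the same format as the 'release date' column.
toy_dates = pd.Series(["01-Jan-1995", "06-Feb-1998", "08-Mar-1996"], name="release date")

# Option 1: split on '-' and take the last piece, then cast so the years are numbers, not text.
year_from_split = toy_dates.str.split("-").str[-1].astype(int)

# Option 2: parse as dates with an explicit format, then take the year.
year_from_parse = pd.to_datetime(toy_dates, format="%d-%b-%Y").dt.year
```

Without the cast in option 1 the years stay strings, which still group correctly but are treated as text by some plotting functions.
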
In [36]:
#distplot for release year (kde disabled); the year is the last piece of 'release date'
moviegenre['release year'] = moviegenre['release date'].str.split('-').str[-1]
sns.distplot(moviegenre['release year'], kde=False)
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c3e0069608>
In [37]:
#lineplot of movie releases per year
#build a dataframe with one row per year and the corresponding release count
moviegenre_count = moviegenre['release year'].value_counts().rename_axis('year').reset_index(name='count')
fig = plot.gcf()
fig.set_size_inches( 50, 10)
sns.lineplot(x='year', y='count',data=moviegenre_count)
Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c3e0fbe908>
In [38]:
#exploratory checks; only the result of the last expression in the cell is displayed
moviegenre
moviegenre.iloc[0]['release year']
moviegenre.groupby(['release year', 'Action']).sum()
moviegenre.groupby(['release year', 'Comedy']).sum()
moviegenre.groupby(['release year', 'Adventure']).sum()
moviegenre.groupby(['release year', 'Animation']).sum()
Out[38]:
movie id unknown Action Adventure Childrens Comedy Crime Documentary Drama Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi Thriller War Western
release year Animation
1922 0 675 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
1926 0 1542 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1930 0 617 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1931 0 656 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0
1932 0 1124 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1996 0 317879 0 44 23 16 104 21 18 170 5 1 11 6 6 38 14 46 9 2
1 4416 0 0 1 5 4 0 0 0 1 0 0 3 0 0 1 1 0 0
1997 0 251129 0 46 19 19 86 30 6 113 4 2 8 2 18 51 13 54 14 0
1 2520 0 0 1 3 1 0 0 0 0 0 0 3 0 0 0 0 0 0
1998 0 75341 0 12 3 1 13 7 3 33 1 2 4 1 3 11 5 18 0 0

91 rows × 19 columns

8. Visualize how popularity of genres has changed over the years - 10

Note that you need to use the number of releases in a year as a parameter of popularity of a genre

Hint 1: You need to reach a data frame where the release year is the index and the genres are the column names (each cell shows the number of releases in one genre in one year), or vice versa. Once that is achieved, you can either use univariate plots or use a heatmap to visualise all the changes over the years in one go.

Hint 2: Use groupby on the relevant column and use sum() on the result to find the number of releases per year and genre.

In [39]:
moviegenre.groupby('release year').sum()
Out[39]:
movie id unknown Action Adventure Animation Childrens Comedy Crime Documentary Drama Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi Thriller War Western
release year
1922 675 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
1926 1542 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1930 617 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
1931 656 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0
1932 1124 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1994 189000 0 30 13 4 15 82 8 9 97 3 0 8 2 2 35 7 25 7 6
1995 183514 0 40 22 6 21 63 11 5 89 3 1 14 3 5 37 15 39 5 2
1996 322295 0 44 24 9 21 108 21 18 170 6 1 11 9 6 38 15 47 9 2
1997 253649 0 46 20 3 22 87 30 6 113 4 2 8 5 18 51 13 54 14 0
1998 75341 0 12 3 0 1 13 7 3 33 1 2 4 1 3 11 5 18 0 0

71 rows × 20 columns
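In the table above the `movie id` column is summed along with the genre flags, which has no meaning. A minimal sketch of restricting the aggregation to the genre columns, and optionally normalising by the number of releases per year, is given below. The toy frame and the two genre columns are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the movie table (hypothetical data, two genres only).
toy_movies = pd.DataFrame({
    "movie id": [1, 2, 3, 4],
    "release year": [1995, 1995, 1996, 1996],
    "Action": [1, 0, 1, 1],
    "Comedy": [0, 1, 0, 1],
})
genre_cols = ["Action", "Comedy"]

# Sum only the genre flags: rows are years, columns are genres, cells are release counts.
releases_per_genre = toy_movies.groupby("release year")[genre_cols].sum()

# Optional: divide each row by the number of movies released that year,
# so years with many releases do not dominate the picture.
movies_per_year = toy_movies.groupby("release year").size()
genre_share = releases_per_genre.div(movies_per_year, axis=0)
```

Either `releases_per_genre` or `genre_share` could be passed to a heatmap; which one is more informative depends on whether absolute counts or relative popularity is of interest.
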

In [40]:
#heatmap of releases per genre per year; 'movie id' is dropped because its sum is meaningless
fig = plot.gcf()
fig.set_size_inches( 25, 71)
mpdf = moviegenre.drop('movie id',axis=1)
sns.heatmap(mpdf.groupby('release year').sum(),annot=True)
Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x1c3e0c33288>

9. Find the top 25 movies according to average ratings such that each movie has number of ratings more than 100 - 10

Hint :

  1. First find the movies that have more than 100 ratings (use merge, groupby and count). Extract the movie titles in a list.
  2. Find the average rating of all the movies and sort them in descending order. You will have to use the .merge() function to arrive at a data set from which you can get the names and the average ratings.
  3. Use isin(list obtained from 1) to keep only the movies that have more than 100 ratings.

Note: This question will need you to research groupby and apply your findings. You can find more on groupby at https://realpython.com/pandas-groupby/.

In [41]:
#number of ratings per movie
userating_movie_id = userrating.groupby('movie id').count() 
#keep only movies with more than 100 ratings
userating_more_than_100 = userating_movie_id[userating_movie_id['user id'] > 100]

#average rating per movie
userrating_movie_rating = userrating.groupby('movie id').mean()
#inner merge keeps the averages only for movies with more than 100 ratings;
#overlapping columns get suffixes: _x = counts, _y = averages
userating_more_than_100 = pd.merge(userating_more_than_100 ,userrating_movie_rating, how='inner', on= 'movie id')
userating_more_than_100 = userating_more_than_100.sort_values('rating_y', ascending=False)
#attach the movie titles
movies_list = pd.merge(userating_more_than_100, moviegenre, how='inner',on='movie id')
movies_list.rename(columns={'rating_y':'rating'}, inplace=True)
movies_list_final = movies_list[['movie id','rating', 'movie title']]
#first 25 movies with highest average rating and more than 100 ratings
movies_list_final.head(25)
Out[41]:
movie id rating movie title
0 408 4.491071 Close Shave, A
1 318 4.466443 Schindler's List
2 169 4.466102 Wrong Trousers, The
3 483 4.456790 Casablanca
4 64 4.445230 Shawshank Redemption, The
5 603 4.387560 Rear Window
6 12 4.385768 Usual Suspects, The
7 50 4.358491 Star Wars
8 178 4.344000 12 Angry Men
9 134 4.292929 Citizen Kane
10 427 4.292237 To Kill a Mockingbird
11 357 4.291667 One Flew Over the Cuckoo's Nest
12 98 4.289744 Silence of the Lambs, The
13 480 4.284916 North by Northwest
14 127 4.283293 Godfather, The
15 285 4.265432 Secrets & Lies
16 272 4.262626 Good Will Hunting
17 657 4.259542 Manchurian Candidate, The
18 474 4.252577 Dr. Strangelove or: How I Learned to Stop Worr...
19 174 4.252381 Raiders of the Lost Ark
20 479 4.251397 Vertigo
21 313 4.245714 Titanic
22 511 4.231214 Lawrence of Arabia
23 484 4.210145 Maltese Falcon, The
24 172 4.204360 Empire Strikes Back, The
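The count and the mean can also be computed in a single groupby pass, which avoids the `_x`/`_y` suffixes from merging two aggregated frames. The sketch below is one possible way to do that, run on toy data. The frames `toy_ratings` and `toy_titles` and the names `n_ratings`, `avg_rating` and `min_ratings` are all hypothetical, and the threshold is lowered from 100 so the toy data produces a result.

```python
import pandas as pd

# Toy stand-ins for the ratings and movie tables (hypothetical data).
toy_ratings = pd.DataFrame({
    "user id":  [1, 2, 3, 1, 2, 1],
    "movie id": [10, 10, 10, 20, 20, 30],
    "rating":   [5, 4, 5, 3, 4, 5],
})
toy_titles = pd.DataFrame({
    "movie id": [10, 20, 30],
    "movie title": ["Movie A", "Movie B", "Movie C"],
})

min_ratings = 1  # the question above uses 100; lowered for the toy data

# One pass over the ratings: number of ratings and average rating per movie.
stats = (
    toy_ratings.groupby("movie id")["rating"]
    .agg(n_ratings="count", avg_rating="mean")
    .reset_index()
)

# Filter by rating count, attach titles, sort by average rating, keep the top 25.
top = (
    stats[stats["n_ratings"] > min_ratings]
    .merge(toy_titles, on="movie id", how="inner")
    .sort_values("avg_rating", ascending=False)
    .head(25)
)
```

Here "Movie C" has the highest average but only one rating, so the threshold excludes it, which is the point of the minimum-ratings condition.
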

10. See the gender distribution across different genres and check the validity of the statements below - 10

  • Men watch more drama than women
  • Women watch more Sci-Fi than men
  • Men watch more Romance than women
  1. There is no need to conduct statistical tests around this. Just compare the percentages and comment on the validity of the above statements.

  2. You might want to use the .sum() and .div() functions here.

  3. Use number of ratings to validate the numbers. For example, if out of 4000 ratings received by women, 3000 are for drama, we will assume that 75% of the women watch drama.
In [42]:
######## Men watch more drama than women ######################
#finding list of drama movies
moviegenre_drama = moviegenre[moviegenre['Drama'] == 1]
dramalist = moviegenre_drama['movie id'].tolist()

#all userrating for drama genre
userrating_drama = userrating[userrating['movie id'].isin(dramalist) ]
userrating_drama_list = userrating_drama['user id'].tolist()

#all users rated for drama from user database
userinfo_drama = userinfo[userinfo['user id'].isin(userrating_drama_list)]
userinfo_drama.groupby('gender').count()
drama_percent = 670 * 100 / (670+273)
print(drama_percent)

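# By this count, more men than women rated at least one drama movie, so the statement holds in absolute numbers.
# no of men who rated drama genre => 670
# no of women who rated drama genre => 273
# share of drama raters who are men = 71%
# Caveat: this counts distinct users rather than ratings, so the figure depends on how many men and
# women are in the data overall; comparing each gender's share of ratings would be the stricter check.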
71.04984093319194
In [43]:
########## Women watch more Sci-Fi than men ############
#finding list of scifi movies
moviegenre_scifi = moviegenre[moviegenre['Sci-Fi'] == 1]
scifilist = moviegenre_scifi['movie id'].tolist()

#all userrating for scifi genre
userrating_scifi = userrating[userrating['movie id'].isin(scifilist) ]
userrating_scifi_list = userrating_scifi['user id'].tolist()

#all users rated for scifi from user database
userinfo_scifi= userinfo[userinfo['user id'].isin(userrating_scifi_list)]
userinfo_scifi.groupby('gender').count()
scifi_percent = 256 * 100 / (652+256)
print(scifi_percent)


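# By this count, fewer women than men rated at least one Sci-Fi movie, so the statement does not hold in absolute numbers.
# no of men who rated sci-fi genre => 652
# no of women who rated sci-fi genre => 256
# share of sci-fi raters who are women = 28.19%
# Caveat: this counts distinct users rather than ratings, so the figure depends on how many men and
# women are in the data overall; comparing each gender's share of ratings would be the stricter check.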
28.19383259911894
In [44]:
############ Men watch more Romance than women #################
#finding list of romance movies
moviegenre_romance = moviegenre[moviegenre['Romance'] == 1]
romancelist = moviegenre_romance['movie id'].tolist()

#all userrating for romance genre
userrating_romance = userrating[userrating['movie id'].isin(romancelist) ]
userrating_romance_list = userrating_romance['user id'].tolist()

#all users rated for romance from user database
userinfo_romance= userinfo[userinfo['user id'].isin(userrating_romance_list)]
userinfo_romance.groupby('gender').count()
romance_percent = 670 * 100 / (670+256)
print(romance_percent)


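# By this count, more men than women rated at least one romance movie, so the statement holds in absolute numbers.
# no of men who rated romance genre => 670
# no of women who rated romance genre => 256
# share of romance raters who are men = 72.35%
# Caveat: this counts distinct users rather than ratings, so the figure depends on how many men and
# women are in the data overall; comparing each gender's share of ratings would be the stricter check.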
72.35421166306695
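The three cells above count distinct users per gender. Hint 3 of the question instead describes the share of each gender's ratings that falls in a genre. A minimal sketch of that calculation, using merge, .sum() and .div(), is shown below on toy data. The frames, the gender codes 'F' and 'M', and the resulting percentages are illustrative assumptions and say nothing about the real dataset.

```python
import pandas as pd

# Toy stand-ins for the ratings, user and movie tables (hypothetical data).
toy_ratings = pd.DataFrame({
    "user id":  [1, 1, 2, 2, 2, 3],
    "movie id": [10, 20, 10, 20, 30, 30],
})
toy_users = pd.DataFrame({"user id": [1, 2, 3], "gender": ["F", "M", "M"]})
toy_movies = pd.DataFrame({
    "movie id": [10, 20, 30],
    "Drama":   [1, 0, 1],
    "Sci-Fi":  [0, 1, 0],
    "Romance": [1, 0, 0],
})
genre_cols = ["Drama", "Sci-Fi", "Romance"]

# One row per rating, carrying the rater's gender and the movie's genre flags.
merged = toy_ratings.merge(toy_users, on="user id").merge(toy_movies, on="movie id")

# Ratings per genre for each gender, and total ratings for each gender.
genre_ratings = merged.groupby("gender")[genre_cols].sum()
total_ratings = merged.groupby("gender").size()

# Percentage of each gender's ratings that went to each genre.
genre_pct = genre_ratings.div(total_ratings, axis=0) * 100
```

Because each row of `genre_pct` is normalised by that gender's own rating total, the comparison is not affected by one gender having more users or more ratings overall. A movie with several genres contributes to each of them, so the percentages in a row need not add up to 100.
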